Polymorphisms in COL4A3 and COL4A4 genes associated with keratoconus.

PURPOSE
Alterations in collagen type IV, alpha-3 (COL4A3) and collagen type IV, alpha-4 (COL4A4) genes may be responsible for a decrease in collagen types I and III, a feature often detected in keratoconus (KC). To evaluate the role of alterations in COL4A3 and COL4A4 in KC, we screened both genes and assessed the association of the detected polymorphisms with KC in Slovenian patients.


METHODS
The study included 104 unrelated patients with KC and 157 healthy blood donors. Diagnosis was established by clinical examination, electronic refractometry, and keratometry. DNA was extracted from blood, and the exons of both genes were amplified by PCR. Non-isotopic high-resolution single-stranded conformation analysis (SSCA) was used to screen COL4A3 and COL4A4, and migration shifts detected by SSCA were subsequently sequenced. For statistical evaluation, control blood donors were selected according to age and sex, and no patient or control included in the statistical analysis was a blood relative of any other participant. Fisher's exact test was used for statistical evaluation, with p<0.05 considered significant.


RESULTS
We detected eight polymorphisms in the COL4A3 gene and six in the COL4A4 gene. Allele frequencies of D326Y in COL4A3 and of M1327V and F1644F in COL4A4 differed significantly between KC patients and controls (Fisher's exact test, p<0.05). When genotypes were analyzed under three models (dominant, recessive, and additive), P141L, D326Y, and G895G in COL4A3 and P482S, M1327V, V1516V, and F1644F in COL4A4 showed significant differences in genotype distribution between KC patients and the control group.


CONCLUSIONS
This is the first mutational screening of COL4A3 and COL4A4 in KC patients, establishing the status of these genes and comparing them with a control population. Analysis of COL4A3 and COL4A4 revealed no mutations associated with KC, but specific genotypes of seven previously described polymorphisms are significantly associated with KC under dominant, recessive, or additive models. Together with previously published data on altered type IV collagen expression in KC and on chromosomal instability in the regions where these genes map, our data suggest that some of the detected polymorphisms could be related to KC.

Alterations of the extracellular matrix and basement membrane are characterized mostly by a decrease in collagen types I and III [9]. Changes in the orientation of collagen molecules, followed by rearrangement of collagen fibrils, also alter the shape and transparency of the cornea [10,11]. A knockout mouse model has shown that disruption of the genes encoding the α1 (COL8A1) and α2 (COL8A2) chains of type VIII collagen leads to structural changes similar to the clinical presentation of keratoglobus [12]. KC has not been associated with mutations in type VIII collagen genes [13], although a relation between COL8A2 mutations and dystrophic corneal disorders has been reported [14,15]. Results from immunohistochemistry, in situ hybridization, and expression arrays show that several other types of collagen are differentially expressed and have an active role in wound healing. Collagen types differentially expressed in keratoconus corneas include types XII, XIII, XV, and XVIII, but no relations between mutations and expression levels are known for these genes [16,17]. Upregulation of collagen type XV and downregulation of collagen type IV in KC corneas, observed by Bochert et al. [18] and Stachs et al. [19], suggested a putative role for these collagen types in KC. Types XIII, XV, and XVIII collagen were found to be expressed in basal corneal cells and may have a role in the adhesion of corneal epithelial cells to each other and to the underlying basement membrane [16,19].
Type IV collagen is found only in basement membranes, where it is the major structural component. Mariyama et al. [20] mapped the collagen type IV, alpha-3 (COL4A3) and collagen type IV, alpha-4 (COL4A4) genes to the same region, 2q35-q37, on opposite strands and transcribed in opposite directions [21]. The COL4A3 gene spans 250 kb and consists of 51 exons; the COL4A4 gene is shorter, spanning 113 kb and consisting of 48 exons [20,22]. COL4A3 and COL4A4 encode two of the six α chains that form heterotrimeric type IV collagen molecules [20,23,24]. Type IV collagen is expressed in the cornea and is implicated in Goodpasture and Alport syndromes, which are often accompanied by eye abnormalities, but the involvement of these genes in eye disorders is still unknown [22,24-26]. COL4A3 has already been implicated in the pathogenesis of posterior polymorphous corneal dystrophy-3 [27,28], and both genes are reported to be differentially expressed in keratoconus corneas [18,19]. Results from the study published by Stachs et al. favored collagen type IV as a candidate in keratoconus pathogenesis [19]. Because a change in the expression levels of the type IV collagen α3 and α4 chains was observed in corneas affected by KC, we investigated whether alterations in COL4A3 and COL4A4 are related to KC.

Patients:
The genetic study included 104 unrelated patients with KC and 157 healthy blood donors as controls. After examination (clinical examination, electronic refractometry, and keratometry) and a detailed personal medical history, an unrelated cohort of patients diagnosed with KC was selected for this study. We excluded patients with other ocular diseases that could influence the interpretation of the results: blepharoconjunctivitis, keratitis, lens opacities, macular changes, and an optic nerve cup/disc (C/D) ratio of 0.3 or more. One hundred and four patients (65 males and 39 females) were included after informed consent had been obtained and the diagnostic and other criteria had been met. None of the included patients had any other diagnosed disease. The patients were aged 20 to 67 years (mean±standard deviation [SD] 39.1±8.2 years). For the control population we used peripheral blood from 157 blood donors collected at the Blood Transfusion Centre of Slovenia (57 women, 100 men; mean age±SD 37.2±10.2 years). Blood samples from patients with KC and from healthy Slovenian blood donors were anticoagulated. Blood from KC patients and controls was drawn from the median cubital vein on the anterior forearm into 3 ml vacuum blood collection tubes with EDTA K3 (Laboratorijska tehnika) and stored in the collection tubes at -20 °C until DNA isolation. The control group was selected to be comparable with the KC patients in age, nationality, and sex. There were no blood relations among individuals in the control group or between control individuals and KC patients, and no control individual had been diagnosed with KC. The National Medical Ethics Committee of the Republic of Slovenia approved the study.

DNA extraction and mutational screening:
Genomic DNA was isolated from peripheral blood lymphocytes by salt precipitation.
After the blood samples were thawed, saline-sodium citrate buffer (Merck) was added, and the samples were mixed on a Vibromix (Tehtnica) and centrifuged (12,000 rpm for 1 min, centrifuge 5415R; Eppendorf). The top portion of the supernatant was discarded, saline-sodium citrate buffer (Merck) was added again, and the mixture was mixed and centrifuged under the same conditions. The supernatant was then discarded, and the pellet was resuspended in a solution of sodium dodecyl sulfate detergent (10% SDS; Sigma-Aldrich) and 5 µl of proteinase K (20 mg/ml H2O; Sigma-Aldrich). The mixture was incubated at 55 °C for 1 h (Thermomixer comfort; Eppendorf). After incubation, the DNA was treated with phenol/chloroform/isoamyl alcohol (25:24:1; Sigma-Aldrich). After centrifugation (12,000 rpm for 1 min, centrifuge 5415R; Eppendorf), the aqueous layer was transferred to a new microcentrifuge tube (Costar), and two consecutive ethanol precipitations of the DNA followed, the first with 100% and the second with 80% ethanol (Merck). Between the two precipitations, the DNA was resuspended in 10:1 Tris-EDTA buffer (Sigma-Aldrich). After the second precipitation, the pellets were dried at room temperature and 10:1 Tris-EDTA buffer (Sigma-Aldrich) was added. The DNA was resuspended in the Tris-EDTA buffer by mixing and incubation at 55 °C overnight (Thermomixer comfort; Eppendorf). COL4A3 and COL4A4 were amplified by PCR using the primers (Operon) previously described by Heidet et al. [29] (COL4A3; Table 1) and Boye et al. [22] (COL4A4; Table 2).
Changes in the PCR products were screened by single-stranded conformation analysis (SSCA) of each PCR fragment for a given set of patient and blood donor samples. Large glass plates (35×40 cm) were used to obtain maximum sensitivity. The shorter plate was coated with Repel-Silane (Merck). The longer plate was coated with Bind-Silane (20 ml of γ-methacryloxypropyltrimethoxysilane, 5 ml of bi-distilled H2O, 5 ml of 100% ethanol; Merck), warmed to 50 °C for about 30 min, and cooled to room temperature. A 3 ml … [30] with most phases at 55 °C. Silver staining (Merck) was performed on thin gels (0.4 mm) fixed on the larger glass plate. Samples with different migration shifts were chosen for sequencing with the BigDye Terminator Ready Reaction Mix (Applied Biosystems). Sequencing products were purified, dissolved, and analyzed on an ABI PRISM 310 Genetic Analyzer (Applied Biosystems; Figure 1 and Figure 2).

Deviations of the Hardy-Weinberg equilibrium:
Deviations from Hardy-Weinberg (H-W) equilibrium were calculated with an online χ² test with 1 degree of freedom (DF=1) for each polymorphism found in the KC patient and control groups. Using a χ² table with DF=1, we obtained the limits for retaining the null hypothesis (that the observed data have Hardy-Weinberg proportions). If the result equaled or was less than 0.05 (5% limit), we concluded that there was no statistical deviation from the Hardy-Weinberg equilibrium in our data (Table 3).

In the primer tables (Table 1 and Table 2), Length represents the length of the PCR product in base pairs (bp), and Annealing temp represents the annealing temperature of the primers used for the PCR reactions.
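The χ² check for H-W proportions can be sketched in Python as follows. This is a minimal illustration of the standard Pearson goodness-of-fit test for a biallelic locus, not the online calculator used in the study; the function name and the example genotype counts are hypothetical. Expected genotype counts are n·p², 2n·p·q, and n·q², with the allele frequency p estimated from the observed genotypes; with DF=1, the upper-tail probability of the χ² distribution equals erfc(√(χ²/2)), and the tabulated 5% critical value is 3.841, below which the null hypothesis of H-W proportions is retained.

```python
import math
from typing import Tuple


def hwe_chi_square(n_AA: int, n_Aa: int, n_aa: int) -> Tuple[float, float]:
    """Pearson chi-square test of Hardy-Weinberg proportions at a biallelic locus.

    Returns (chi2, p_value) with 1 degree of freedom
    (3 genotype classes - 1 - 1 estimated allele frequency).
    """
    n = n_AA + n_Aa + n_aa
    if n == 0:
        raise ValueError("at least one genotyped individual is required")
    # Allele frequency of A estimated from the observed genotypes (2n alleles in total).
    p = (2 * n_AA + n_Aa) / (2 * n)
    q = 1.0 - p
    observed = (n_AA, n_Aa, n_aa)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Skip empty expected classes (only possible when the locus is monomorphic).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper-tail probability of a chi-square variable with 1 DF: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


CRITICAL_5_PERCENT_DF1 = 3.841  # chi-square table value for DF=1 at the 5% level

# Hypothetical genotype counts, not data from this study.
chi2, p_value = hwe_chi_square(50, 0, 50)
retained = chi2 < CRITICAL_5_PERCENT_DF1
print(f"chi2 = {chi2:.2f}, p = {p_value:.3g}, H-W proportions retained: {retained}")
```

For example, counts of 25, 50, and 25 match H-W proportions exactly and give χ²=0, whereas 50, 0, and 50 give χ²=100 because no heterozygotes are observed where 50 are expected.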
Associations between allele and genotype frequencies: The associations between the detected polymorphisms and KC were assessed with Fisher's exact test with a two-sided p value. Fisher's exact test was chosen because it is based on exact probabilities from the hypergeometric distribution and is preferred over the χ² test when samples are small and a large-sample approximation would be inappropriate. A two-sided p value was calculated to determine the significance of each relationship, and p<0.05 was considered statistically significant. The magnitude and direction of significant relationships for each allele or genotype group between KC patients and the control group are summarized as odds ratios (OR) and relative risks (RR; Table 4). Genotype frequencies for each polymorphism in the two tested groups (patients and controls) were compared under dominant and recessive models. The dominant model assumes that at least one changed allele is required. We therefore combined the number of heterozygous genotypes with the number of genotypes homozygous for the allele in question and analyzed, for each polymorphism, whether the representation of this combined group differed significantly between cases and controls (Table 5). The recessive model assumes that both alleles must be changed. We therefore compared the number of genotypes homozygous for the allele in question with the combined number of heterozygous genotypes and genotypes homozygous for the other allele, and analyzed, for each polymorphism, whether the representation of genotypes differed significantly between cases and controls (Table 5). An additive model was constructed to test the differences between KC patients and the control group for all genotypes of the detected polymorphisms (Table 6).
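The way genotype counts could be collapsed into 2×2 tables under each model is sketched in Python below. This is an illustrative reading of the description above, not the authors' SPSS procedure; the function names and the genotype counts are hypothetical. Each table has the layout ((cases with, cases without), (controls with, controls without)). The per-genotype comparison (one genotype against all others) is assumed to correspond to the additive analysis because Table 6 reports individual genotypes; in much of the genetic association literature, "additive model" instead denotes a per-allele trend test such as the Cochran-Armitage test.

```python
from typing import Dict, Tuple

# 2x2 layout: ((cases_with, cases_without), (controls_with, controls_without))
Table = Tuple[Tuple[int, int], Tuple[int, int]]
Counts = Dict[str, int]  # genotype -> number of individuals, e.g. {"CC": 40, "CT": 44, "TT": 20}


def classify(counts: Counts, allele: str) -> Tuple[int, int, int]:
    """Split genotype counts into (homozygous for allele, heterozygous, homozygous for the other allele)."""
    hom = het = other = 0
    for genotype, n in counts.items():
        copies = genotype.count(allele)  # 0, 1, or 2 copies of the allele
        if copies == 2:
            hom += n
        elif copies == 1:
            het += n
        else:
            other += n
    return hom, het, other


def dominant_table(cases: Counts, controls: Counts, allele: str) -> Table:
    """At least one copy of `allele` (heterozygous + homozygous) versus none."""
    rows = []
    for counts in (cases, controls):
        hom, het, other = classify(counts, allele)
        rows.append((hom + het, other))
    return rows[0], rows[1]


def recessive_table(cases: Counts, controls: Counts, allele: str) -> Table:
    """Two copies of `allele` versus heterozygous + homozygous for the other allele."""
    rows = []
    for counts in (cases, controls):
        hom, het, other = classify(counts, allele)
        rows.append((hom, het + other))
    return rows[0], rows[1]


def genotype_table(cases: Counts, controls: Counts, genotype: str) -> Table:
    """One genotype versus all other genotypes (assumed per-genotype comparison)."""
    rows = []
    for counts in (cases, controls):
        n_genotype = counts.get(genotype, 0)
        rows.append((n_genotype, sum(counts.values()) - n_genotype))
    return rows[0], rows[1]


def allele_table(cases: Counts, controls: Counts, allele: str) -> Table:
    """Allele counts (two alleles per person): copies of `allele` versus the other allele."""
    rows = []
    for counts in (cases, controls):
        hom, het, other = classify(counts, allele)
        rows.append((2 * hom + het, het + 2 * other))
    return rows[0], rows[1]


# Hypothetical counts for a C>T polymorphism (104 cases, 157 controls).
cases = {"CC": 40, "CT": 44, "TT": 20}
controls = {"CC": 70, "CT": 60, "TT": 27}
print(dominant_table(cases, controls, "T"))   # ((64, 40), (87, 70))
print(recessive_table(cases, controls, "T"))  # ((20, 84), (27, 130))
```

Each resulting table can then be passed to a 2×2 test such as Fisher's exact test.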
For all statistical comparisons we used Fisher's exact test, and when the two-sided p value was less than 0.05, the results were summarized as OR and RR. All statistical analyses were performed using SPSS ver. 14 (SPSS Inc.).
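A two-sided Fisher's exact test for a 2×2 table, together with the OR and RR, can be sketched in pure Python as below. This is a minimal illustration, not SPSS's implementation; it uses the common convention of summing the probabilities of all tables with the same margins that are no more probable than the observed one. The layout [[a, b], [c, d]] (rows: cases, controls; columns: carriers, non-carriers), the RR orientation (proportion of cases among carriers divided by that among non-carriers), the function names, and the counts are assumptions, since the document does not state them. In a case-control design, an RR computed this way depends on the ratio of cases to controls sampled, whereas the OR does not.

```python
import math
from math import comb
from typing import List


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    With the margins fixed, the count in cell `a` follows a hypergeometric
    distribution; the p value sums the probabilities of all tables that are
    no more probable than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denominator = comb(n, col1)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs: List[float] = [comb(row1, x) * comb(row2, col1 - x) / denominator
                          for x in range(lo, hi + 1)]
    p_observed = probs[a - lo]
    # Small relative tolerance so that tables tied with the observed one are counted.
    p = sum(prob for prob in probs if prob <= p_observed * (1 + 1e-7))
    return min(1.0, p)


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Cross-product ratio (a*d)/(b*c); infinite or undefined when a cell is zero."""
    if b * c == 0:
        return math.inf if a * d > 0 else math.nan
    return (a * d) / (b * c)


def risk_ratio(a: int, b: int, c: int, d: int) -> float:
    """Proportion of cases among carriers divided by that among non-carriers (assumed orientation)."""
    if a + c == 0 or b + d == 0:
        return math.nan
    risk_carriers = a / (a + c)
    risk_noncarriers = b / (b + d)
    if risk_noncarriers == 0:
        return math.inf if risk_carriers > 0 else math.nan
    return risk_carriers / risk_noncarriers


# Hypothetical counts: cases 8 carriers / 2 non-carriers, controls 1 / 5.
a, b, c, d = 8, 2, 1, 5
p = fisher_exact_two_sided(a, b, c, d)
print(f"p = {p:.4f}, OR = {odds_ratio(a, b, c, d):.1f}, RR = {risk_ratio(a, b, c, d):.3f}")
# p = 0.0350, OR = 20.0, RR = 3.111
```

Following the rule described above, OR and RR would be reported only when the two-sided p value is below 0.05.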
SIFT and PolyPhen predictions for polymorphisms causing amino acid substitution: The potential impact of polymorphisms causing amino acid substitution was assessed with two analytic tools: SIFT and PolyPhen. SIFT is a sequence homology-based tool that sorts intolerant from tolerant amino acid substitutions and predicts whether an amino acid substitution in a protein will have a phenotypic effect. SIFT is based on the premise that protein evolution is correlated with protein function. Positions important for function should be conserved in the alignment of the protein family, whereas unimportant positions should appear diverse in the alignment. The SIFT tool calculates a score for the amino acid substitution, and a score lower than 0.05 is considered potentially damaging (Table 7). PolyPhen (Brigham and Women's Hospital, Harvard Medical School) is a tool for predicting the possible impact of an amino acid substitution on the structure and function of a human protein. This prediction is based on straightforward empirical rules, which are applied to the sequence, phylogenetic, and structural information characterizing the substitution. The PolyPhen tool uses Position-Specific Independent Counts software to calculate profile scores obtained from the likelihood of a given amino acid occurring at a position of interest compared to background frequencies (the likelihood of this amino acid occurring at any position; Table 7).
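The two scoring ideas described above can be illustrated with a deliberately simplified Python sketch. It is not SIFT or PolyPhen: the real tools build their alignments from database searches and use more elaborate priors and sequence weighting. Here, one alignment column is converted to amino acid probabilities with a simple additive pseudocount (an assumption); a SIFT-like score is the substituted residue's probability scaled by that of the most frequent residue, with values below 0.05 flagged as not tolerated; and a PSIC-like profile score is the log-odds ln(p/background), with an assumed uniform background of 1/20, compared between the wild-type and substituted residues. The function names and the all-glycine example column are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Optional, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_probabilities(column: Sequence[str], pseudocount: float = 1.0) -> Dict[str, float]:
    """Amino acid probabilities at one alignment position with an additive pseudocount (assumed)."""
    counts = Counter(residue for residue in column if residue in AMINO_ACIDS)  # gaps ignored
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def sift_like_score(column: Sequence[str], substituted: str, pseudocount: float = 1.0) -> float:
    """Probability of the substituted residue scaled so the most frequent residue scores 1.0."""
    probs = column_probabilities(column, pseudocount)
    return probs[substituted] / max(probs.values())


def profile_score_difference(column: Sequence[str], wild_type: str, variant: str,
                             background: Optional[Dict[str, float]] = None,
                             pseudocount: float = 1.0) -> float:
    """Log-odds profile score ln(p / background) of the wild type minus that of the variant."""
    probs = column_probabilities(column, pseudocount)
    if background is None:
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}  # assumed uniform

    def profile_score(aa: str) -> float:
        return math.log(probs[aa] / background[aa])

    return profile_score(wild_type) - profile_score(variant)


column = "G" * 30  # hypothetical: 30 aligned sequences, all glycine at this position
score = sift_like_score(column, "R")
print(f"SIFT-like score for G->R: {score:.3f}", "not tolerated" if score < 0.05 else "tolerated")
# SIFT-like score for G->R: 0.032 not tolerated
print(f"Profile score difference G->R: {profile_score_difference(column, 'G', 'R'):.2f}")
# Profile score difference G->R: 3.43
```

A large positive profile score difference means the wild-type residue is far more typical of the position than the substituted one, which is the intuition behind both tools.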

Mutational analysis:
Mutational analysis of all exons of the COL4A3 and COL4A4 genes did not reveal any mutations in KC patients. We detected eight polymorphisms in COL4A3, six of them amino acid substitutions (G43R, P141L, E162G, D326Y, H451R, and P574L), and six polymorphisms in COL4A4, three of them amino acid substitutions (P482S, G545A, and M1327V; Table 3, Figure 1 and Figure 2). All of the polymorphisms were also detected in the healthy population and have been described previously, as shown in Table 3.

Hardy-Weinberg equilibrium:
When analyzing H-W equilibrium, we found that the frequencies of most of the detected polymorphisms deviate from the expected numbers in both KC patients and controls. In the COL4A3 gene, only three polymorphisms (P141L, D326Y, and G895G) in the control group and two (D326Y and P574L) in the KC patient group had a p value less than the 5% limit, which was the cut-off value for determining no statistical deviation from the H-W equilibrium. In COL4A4, the observed frequencies of three polymorphisms (P482S, M1327V, and V1516V) in the control group and two (G789G and M1327V) in the KC patient group did not deviate from the H-W equilibrium (Table 3).

Associations between allele and genotype frequencies:
The allele frequencies of three polymorphisms were significantly associated with KC (Table 4). The P141L, D326Y, and G895G polymorphisms in COL4A3 and the P482S, M1327V, V1516V, and F1644F polymorphisms in COL4A4 were associated with KC, either as genotypes or as alleles, with p values less than 0.05 (Fisher's exact test; Table 4, Table 5, and Table 6). When analyzing the representation of genotypes for all the polymorphisms in KC patients and controls, we found that some genotypes were significantly more frequent in the KC patient group (Table 4). The analysis considered either both alleles (recessive model) or at least one allele (dominant model) being changed.
Analysis under both models also showed that some genotypes were significantly less frequent in KC patients: 976TT and 2685AA in COL4A3 and 1444CC and 4932TT in COL4A4 under the dominant model, and 422TT and 976TT in COL4A3 and 3979AA, 4548GG, and 4932TT in COL4A4 under the recessive model (Table 5). In the additive model, genotypes 422CC, 422TT, 422CT, 976GG, 976TT, and 2685AA in COL4A3 and 1444CT, 3979AA, 3979GG, 4548AG, 4932CC, and 4932CT in COL4A4 were significantly different between KC patients and the control group (Table 6).

SIFT and PolyPhen predictions:
PolyPhen analysis predicted that the G43R, P141L, D326Y, and P574L polymorphisms in the COL4A3 gene are potentially damaging. All tested missense polymorphisms in COL4A4 were predicted to be benign. SIFT analysis gave a score below 0.05 for G43R in COL4A3 and G545A in COL4A4; these substitutions are predicted to affect protein function and not to be tolerated. All other substitutions are predicted to be tolerated (Table 7).
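The genotype names above use nucleotide positions, whereas the polymorphisms are named by amino acid changes. Assuming that positions are numbered in the coding sequence starting from the A of the ATG start codon (the document does not state its numbering scheme), a position maps to its codon by simple integer arithmetic, as in the Python sketch below. The pairing of positions with amino acid names is inferred from the order in which they are listed in the text, and the arithmetic only checks its consistency; the function names are hypothetical.

```python
import re
from typing import List, Tuple


def codon_of(position: int) -> Tuple[int, int]:
    """Map a 1-based coding-DNA position (A of ATG = 1) to (codon number, position within codon)."""
    if position < 1:
        raise ValueError("coding positions start at 1")
    return (position - 1) // 3 + 1, (position - 1) % 3 + 1


def codon_in_protein_change(change: str) -> int:
    """Extract the codon number from a protein-level name such as 'M1327V'."""
    match = re.fullmatch(r"[A-Z](\d+)[A-Z]", change)
    if match is None:
        raise ValueError(f"unrecognized protein change: {change}")
    return int(match.group(1))


# Inferred pairing of genotype positions with the polymorphism names used in the text.
PAIRS: List[Tuple[str, int, str]] = [
    ("COL4A3", 422, "P141L"), ("COL4A3", 976, "D326Y"), ("COL4A3", 2685, "G895G"),
    ("COL4A4", 1444, "P482S"), ("COL4A4", 3979, "M1327V"),
    ("COL4A4", 4548, "V1516V"), ("COL4A4", 4932, "F1644F"),
]

for gene, position, change in PAIRS:
    codon, frame = codon_of(position)
    status = "consistent" if codon == codon_in_protein_change(change) else "MISMATCH"
    print(f"{gene} {position}: codon {codon}, codon position {frame} -> {change} ({status})")
```

Under this assumption, every missense change (P141L, D326Y, P482S, M1327V) falls on the first or second codon position, and every silent change (G895G, V1516V, F1644F) falls on the third position, where most synonymous substitutions occur because of the degeneracy of the genetic code.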

DISCUSSION
To our knowledge, this is the first report describing genetic screening of two type IV collagen genes in KC patients. Frequent polymorphisms were found in both affected and healthy individuals, but no mutations in either gene that could be related to KC were discovered. Previous data have shown that type IV collagen expression is deregulated in KC patients and that chromosomal locations harboring genes important in the regulation of collagen synthesis (including type IV collagen) are frequently subject to aneuploidy and translocation [18,31]. Given the altered collagen levels in KC and the absence of data linking mutations in previously studied collagen genes to KC, we analyzed COL4A3 and COL4A4, which are deregulated in KC patients, are often subject to chromosomal aberrations, and could also be responsible for the decrease in collagen types I and III often detected in the disease [8,9,11,18,19,31].
All of the alterations found in both genes have already been reported in other studies. When analyzing whether the detected polymorphisms were in H-W equilibrium, we found that most of them were not. It is difficult to determine the main reason for this, but probable causes of the population differences shown in this study include selection, small population size, population stratification, and genetic drift. Departures from H-W equilibrium for these reasons are not rare. The control group was selected as described in the Methods section. Considering the small number of some of the alleles found, a larger control sample and the inclusion of different nationalities and races could help to meet the criteria for H-W equilibrium, but the allele frequencies described in our study are comparable to those found by Šlajpah et al. [32]. Even though obvious violations of the H-W equilibrium were detected, genotype and allele representations for some polymorphisms differ statistically between the groups and are much more frequent in KC patients than in the healthy population, which should be taken into consideration when assessing differences between genotypes and phenotypes in a given population.

Table 3 notes: … [34]. Hardy-Weinberg CHI (p-value): chi values calculated from our data for cases and controls separately, and the deviation between observed and expected numbers. When the p-value equals or is less than the 0.05 (5%) limit, there is no statistical deviation from the Hardy-Weinberg equilibrium.
To predict the effects of the detected substitutions, we used two different tools, PolyPhen and SIFT, which predict the possible impact of an amino acid substitution on protein structure and function. The COL4A3 G43R, P141L, D326Y, and P574L polymorphisms were predicted by PolyPhen to have an effect, but SIFT predicted only G43R to be damaging. Of all the substitutions found in COL4A4, only G545A was predicted by SIFT to be damaging. Discrepancies between the tools are expected because they use different scoring matrices and different approaches to assessing damaging effects. PolyPhen predicts the functional effect of substitutions from the level of sequence conservation between homologous genes over evolutionary time, the properties of the exchanged residues, and the proximity of the substitution to predicted protein domains and structural features within the protein.
SIFT predicts the functional importance of an amino acid substitution based on an alignment of highly similar protein sequences. Its predictions rely on whether the amino acid at the position of interest is conserved in the protein family, which can indicate its importance for the normal function or structure of the protein. Not all substitutions predicted to affect protein function are involved in disease development or progression, especially in complex diseases such as KC. Still, in the absence of functional data, predictive tools are useful for identifying substitutions that are more likely to affect wild-type protein function. Despite the differences between the SIFT and PolyPhen results, PolyPhen predicts D326Y in COL4A3, one of the three alleles significantly associated with KC, to be damaging, so this substitution could affect the structure and function of the protein. The other two alleles significantly associated with KC were found in the COL4A4 gene and result in one missense and one silent alteration (3979G, M1327V and 4932C, F1644F); the M1327V substitution, however, is predicted to be benign and tolerated. When comparing genotypes, we found specific genotypes related to KC even where the allele distribution was not significantly different. Under the different models (dominant, recessive, and additive), we found a significant representation of the following genotypes: 422CC, 422TT, 422CT, and 2685CC in COL4A3 and 1444TT, 4548AA, and 4548AG in COL4A4. The prediction tools showed that some of the substitutions resulting from these genotypes could be damaging (Table 7). To determine whether these genotype representations are specific to our population or are in fact disease specific, different populations should be examined and the data compared.
In view of the lack of mutations, we could speculate that mutations in type IV collagen (COL4A3 and COL4A4 genes) are not involved in KC and that other genes and factors are involved in the pathogenesis of this disorder, but functional assays would be required to clarify this. This study established that significant relationships exist between KC and different genotypes in COL4A3 and COL4A4, so the significance of these genotypes should be established by further analyses involving different populations. Some of the polymorphisms could be related to KC, which could help in determining the molecular genetics of the disease.

Table 7 notes: The PolyPhen (Polymorphism Phenotyping) tool predicts the possible impact of an amino acid substitution on the structure and function of a human protein using straightforward physical and comparative considerations. The SIFT (Sorting Intolerant From Tolerant) tool predicts whether an amino acid substitution will affect protein function; based on sequence homology and the physical properties of amino acids, it calculates the potential impact of the amino acid change (a score lower than 0.05 is considered potentially damaging).